Improving statistical computation in R

Ben Bolker

2024-02-27

introduction

should you optimize?

“software engineers should worry about other issues (such as good algorithm design and good implementations of those algorithms) before they worry about micro-optimizations” (Hyde 2009)

Do you have other sh*t to do? No? Please contact me and I’ll help you with that. Yes? You are among the 90% of R users whose first priority is not computer programming. The time spent optimizing code is often longer than the computing time actually saved. Use the simple solutions and you can get on with your research/life. (Ross 2013)

  • but faster code enables interactivity, exploration, resampling-based methods …

categories of optimization

  • low-hanging fruit
  • parallelization
  • (memoization)
  • automatic differentiation
  • faster languages
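Memoization can be sketched in a few lines of base R (the `memoize` helper, `slow_square`, and the string-key scheme are illustrative, not a real package API):

```r
## memoization: cache results of an expensive, deterministic function
memoize <- function(f) {
    cache <- new.env(parent = emptyenv())
    function(x) {
        key <- paste(x, collapse = "_")
        if (!exists(key, envir = cache, inherits = FALSE)) {
            assign(key, f(x), envir = cache)   # compute once, store
        }
        get(key, envir = cache)                # later calls hit the cache
    }
}

slow_square <- function(x) { Sys.sleep(0.2); x^2 }
fast_square <- memoize(slow_square)
fast_square(4)  # slow the first time ...
fast_square(4)  # ... instant afterwards
```

For real use, the memoise package implements this pattern with proper argument hashing and configurable caches.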

benchmarking and profiling

measuring code performance

  • measure speed before you try to fix it
  • complexity theory/big-O notation (e.g., \(f(x) = {\cal O}(x^{3/2})\)) is useful if
    • you have some idea of the scaling constant
    • you plan to scale to large problems (or someone does)
  • scaling dimensions: number of observations, parameters, clusters, dimensions …
  • Amdahl’s Law: speeding up bottlenecks is what matters
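Amdahl's Law has a one-line formula worth keeping in mind: if a fraction p of the runtime is sped up by a factor s, the overall speedup is 1/((1-p) + p/s). A quick illustration (the numbers are arbitrary):

```r
## Amdahl's Law: overall speedup when a fraction p of the runtime
## is sped up by a factor s
amdahl <- function(p, s) 1 / ((1 - p) + p / s)

amdahl(p = 0.9, s = 10)   # 90% of the time sped up 10x: ~5.3x overall
amdahl(p = 0.5, s = Inf)  # infinite speedup of half the code: only 2x
```

The second line is the cautionary case: optimizing anything other than the bottleneck has a hard ceiling.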

benchmarking

  • examples should be large enough not to be swamped by overhead
  • … but small enough to run replicates and average
  • system.time() for quick estimates
  • rbenchmark, microbenchmark packages
  • estimating scaling: theoretical or empirical (log-log plot)
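Both timing styles can be sketched as follows (the loop-vs-sum comparison and sizes are arbitrary; microbenchmark is only called if installed):

```r
x <- runif(1e5)

## one-shot timing: fine for slow operations, noisy for fast ones
system.time(sum(x))

## replicated timing: reports the distribution over many runs
if (requireNamespace("microbenchmark", quietly = TRUE)) {
    print(microbenchmark::microbenchmark(
        loop       = { s <- 0; for (xi in x) s <- s + xi },
        vectorized = sum(x),
        times      = 10
    ))
}
```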

scaling example

Modified from Brooks et al. (2017):
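One generic recipe for an empirical scaling study: time the computation at several problem sizes, then regress log(time) on log(n); the slope estimates the scaling exponent. (sort() and the sizes below are arbitrary stand-ins, and timings this short are noisy in practice.)

```r
## time sort() at several sizes and estimate the scaling exponent
ns <- 10^seq(4.5, 6, by = 0.5)
times <- sapply(ns, function(n) {
    x <- runif(n)
    system.time(sort(x))["elapsed"]
})
## slope of the log-log fit ~ big-O exponent (just above 1 for n log n)
fit <- lm(log(times + 1e-4) ~ log(ns))  # offset guards against 0 timings
coef(fit)[["log(ns)"]]
```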

profiling

Example random-walk code (Ihaka, Temple Lang, and McArdle 2009):

rw2d2 <- function(n) {
    steps <- sample(c(-1, 1), n - 1, replace = TRUE)
    xdir <- sample(c(TRUE, FALSE), n - 1, replace = TRUE)
    xpos <- c(0, cumsum(ifelse(xdir, steps, 0)))
    ypos <- c(0, cumsum(ifelse(xdir, 0, steps)))
    return(list(x = xpos, y = ypos))
}

profiling run

  • run multiple times to collect enough data/average over variation

Rprof("Rprof.out") ## start profiling
for (i in 1:100) {
    pos <- rw2d2(1e5)
}
Rprof(NULL) ## stop profiling

profiling results

source("https://raw.githubusercontent.com/noamross/noamtools/master/R/proftable.R")
proftable("Rprof.out", lines = 5)
 PctTime Call               
 38.18   sample > sample.int
 29.09   ifelse > which     
 18.18   ifelse             
  7.27   sample             
  3.64   c                  

Parent Call: rw2d2 > ...

Total Time: 1.1 seconds
Percent of run time represented: 96.4 %

profiling

  • see also summaryRprof() (base-R), profvis package (RStudio fanciness)
  • easiest to interpret if code is organized into functions
  • profiling C++ code called from R is harder but not impossible

low-hanging fruit

don’t write bad code

  • don’t grow objects (Burns 2012, Circle 2)
    • pre-allocate (numeric() etc.)
    • or build the pieces in a list and rbind() them together once at the end
  • don’t fail to vectorize (Circle 3)
  • don’t over-vectorize (Circle 4)

Burns (2012): “If you are using R and you think you’re in hell, this is a map for you.”
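A sketch of the grow-vs-preallocate advice (the one-row data frame per iteration stands in for any simulation result):

```r
n <- 500

## bad: each rbind() copies the whole accumulated data frame (O(n^2) work)
res <- data.frame()
for (i in seq_len(n)) res <- rbind(res, data.frame(i = i, x = i^2))

## better: fill a pre-sized list, combine once at the end
pieces <- vector("list", n)
for (i in seq_len(n)) pieces[[i]] <- data.frame(i = i, x = i^2)
res2 <- do.call(rbind, pieces)
```

Both give the same result; only the second scales gracefully as n grows.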

calculating \(\pi\) by Monte Carlo

bad code

non-vectorized, growing objects

piR_slow <- function(N) {
    res <- numeric(0)
    for (i in 1:N) {
        res <- c(res, as.numeric(runif(1)^2 + runif(1)^2 < 1.0))
    }
    4*sum(res)/N
}

better code

piR <- function(N) {
    x <- runif(N)
    y <- runif(N)
    d <- sqrt(x^2 + y^2)
    return(4 * sum(d < 1.0) / N)
}

benchmarking

rbenchmark::benchmark(
                bad = piR_slow(1e4),
                better = piR(1e4),
                columns = c("test", "replications", "elapsed", "relative"))
    test replications elapsed relative
1    bad          100  10.951  353.258
2 better          100   0.031    1.000

use good packages

  • especially for data handling, I/O
    • data.table > tidyverse > base R
    • data formats: arrow, vroom, direct database access (dbplyr)
  • collapse, xts, Rfast

sum-by-groups benchmark
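A minimal base-R version of the sum-by-groups task, as a baseline for the faster packages above (data, sizes, and seed are arbitrary; the data.table equivalent would be `DT[, sum(x), by = g]`, and collapse has `fsum()`):

```r
set.seed(101)
N <- 1e5
d <- data.frame(g = sample(1000, N, replace = TRUE), x = runif(N))

## three base-R ways to sum x within levels of g
s1 <- tapply(d$x, d$g, sum)
s2 <- rowsum(d$x, d$g)                 # usually the fastest base option
s3 <- aggregate(x ~ g, data = d, sum)  # slowest, but returns a data frame

all.equal(as.numeric(s1), as.numeric(s2))
```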

get a bigger computer

parallelization

parallelization types

  • distributed: “embarrassingly parallel”
  • multicore: multiple R processes
    • each one copies all objects
  • threading: lightweight, shared-memory
  • see also: parallel and HPC task view

distributed computing

  • start individual runs, or chunks (e.g. simulations, bootstraps) as completely separate R jobs
  • Compute Canada (or AWS or a big local workstation)
  • jobs assigned by batch scheduler: High Performance Computing
  • some tools within R: rslurm, batchtools, future.batchtools packages

multicore

  • spawn multiple jobs on a single machine (multiple cores)
  • parallel package, foreach/doParallel
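A minimal multicore sketch with the parallel package (`slow_task` is a stand-in for any per-job computation; fork-based mclapply() is unavailable on Windows, hence the fallback):

```r
library(parallel)

slow_task <- function(i) { Sys.sleep(0.2); i^2 }

## fork-based workers: cheap to start on Linux/macOS; each worker sees
## (copy-on-write) copies of the parent's objects
ncores <- if (.Platform$OS.type == "windows") 1 else 2
res <- mclapply(1:4, slow_task, mc.cores = ncores)
unlist(res)
```

foreach/doParallel wrap the same machinery in loop-like syntax; cluster-based variants (makeCluster() + parLapply()) also work on Windows but require explicitly exporting objects to each worker.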

threading

  • not available directly through R
  • OpenMP
  • parallel BLAS
    • base R comes with a robust but slow linear algebra library
    • can replace with optimized/parallel versions (command-line bullshittery required (Browne 2021))

BLAS benchmarks
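You can check which BLAS a session is linked against and time a BLAS-bound operation directly (matrix size is arbitrary):

```r
## which BLAS/LAPACK is R linked against? (reported since R 3.4)
sessionInfo()$BLAS

## dense matrix multiplication is dominated by BLAS; an optimized BLAS
## (OpenBLAS, MKL, ...) can speed this up by an order of magnitude
n <- 500
A <- matrix(rnorm(n^2), n, n)
system.time(B <- tcrossprod(A))  # A %*% t(A), in a single BLAS call
```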

automatic differentiation

AD basics

  • automated chain rule (Griewank 1989; Griewank and Walther 2003)
  • much faster and more robust than finite differences; faster than symbolic derivatives
  • computing gradient takes \(\le 5 \times\) the effort of computing the objective function, regardless of dimension
  • reverse-mode autodiff for \(f: {\mathbf R}^n \to {\mathbf R}\)
  • widely used in modern ML (neural net back-propagation)
  • toolboxes: TensorFlow, PyTorch (torch for R (Keydana 2023)), Stan (Chau 2022), (R)TMB, Julia tools

a side note on optim()

  • if you use a gradient-based optimizer (BFGS, L-BFGS-B) in optim(), R falls back to finite-difference gradients by default
  • this is slow and fragile, although fine for simple problems
  • … in that case you might as well use gradient-free methods (e.g. Nelder-Mead) for robustness
  • providing gradients (somehow) can make a huge difference for tough optimization problems
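Supplying a gradient is just optim()'s `gr` argument. A toy sketch on the Rosenbrock function with a hand-derived gradient (for real problems the gradient would come from deriv(), the Deriv package, or an AD tool):

```r
## Rosenbrock function and its analytic gradient
fn <- function(p) (1 - p[1])^2 + 100 * (p[2] - p[1]^2)^2
gr <- function(p) c(
    -2 * (1 - p[1]) - 400 * p[1] * (p[2] - p[1]^2),
     200 * (p[2] - p[1]^2)
)

## without gr =, BFGS silently falls back to finite differences
fit <- optim(c(-1.2, 1), fn, gr, method = "BFGS")
fit$par  # converges to (1, 1)
```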

AD example

deriv(~cos(log(1+x^2)), "x")
expression({
    .expr2 <- 1 + x^2
    .expr3 <- log(.expr2)
    .value <- cos(.expr3)
    .grad <- array(0, c(length(.value), 1L), list(NULL, c("x")))
    .grad[, "x"] <- -(sin(.expr3) * (2 * x/.expr2))
    attr(.value, "gradient") <- .grad
    .value
})
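deriv() can also return the result as a callable function via `function.arg`, which is the form you would actually use inside an optimizer:

```r
## same derivative, packaged as a function of x
dfun <- deriv(~ cos(log(1 + x^2)), "x", function.arg = "x")
r <- dfun(1)
## value is cos(log(2)); the "gradient" attribute holds d/dx at x = 1
c(value = c(r), gradient = attr(r, "gradient")[1, "x"])
```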

better AD in R?

  • basic deriv() only handles a limited set of built-in functions
  • Deriv package
  • TMB/RTMB (C++), Chau (2022) (Stan), autodiffr package (Julia)

integration with lower-level languages

Rcpp

https://csgillespie.github.io/efficientR/performance.html

library(Rcpp)
pi_rcpp <- cppFunction("
double piSugar(const int N) {
    NumericVector x = runif(N);
    NumericVector y = runif(N);
    NumericVector d = sqrt(x*x + y*y);
    return 4.0 * sum(d < 1.0) / N;
}")

benchmarking against previous example

rbenchmark::benchmark(piR(1e6), pi_rcpp(1e6),
                      columns = c("test", "replications", "elapsed", "relative"))
            test replications elapsed relative
2 pi_rcpp(1e+06)          100   1.876    1.000
1     piR(1e+06)          100   3.280    1.748

more on Rcpp

  • lots of examples (gallery)
  • not much faster than already-vectorized code
  • loops don’t hurt!
  • easy to incorporate into packages
  • advanced linear algebra via RcppArmadillo, RcppEigen

References

Brooks, Mollie E., Kasper Kristensen, Koen J. van Benthem, Arni Magnusson, Casper W. Berg, Anders Nielsen, Hans J. Skaug, Martin Mächler, and Benjamin M. Bolker. 2017. “glmmTMB Balances Speed and Flexibility Among Packages for Zero-Inflated Generalized Linear Mixed Modeling.” The R Journal 9 (2): 378–400. https://doi.org/10.3929/ethz-b-000240890.
Browne, Kevin. 2021. “Command-Line Bullshittery and Other Realities of Computing.” https://www.kevinbrowne.ca/command-line-bullshittery-and-other-realities-of-computing/.
Burns, Patrick. 2012. The R Inferno. Lulu.com. http://www.burns-stat.com/pages/Tutor/R_inferno.pdf.
Chau, Joris. 2022. “Automatic Differentiation in R with Stan Math.” A Random Walk. https://www.jchau.org/2022/01/24/automatic-differentiation-in-r-with-stan-math/.
Griewank, Andreas. 1989. “On Automatic Differentiation.” In Mathematical Programming: Recent Developments and Applications, edited by M. Iri and K. Tanabe, 6:83–108. Kluwer. http://softlib.rice.edu/pub/CRPC-TRs/reports/CRPC-TR89003.pdf.
Griewank, Andreas, and Andrea Walther. 2003. “Introduction to Automatic Differentiation.” PAMM 2 (1): 45–49. https://doi.org/10.1002/pamm.200310012.
Hyde, Randall. 2009. “The Fallacy of Premature Optimization.” Ubiquity 2009 (February): 1. https://doi.org/10.1145/1569886.1513451.
Ihaka, Ross, Duncan Temple Lang, and Brendan McArdle. 2009. “Writing Efficient Programs in R (and Beyond).” Taupo, New Zealand. https://www.stat.auckland.ac.nz//~ihaka/downloads/Taupo-handouts.pdf.
Keydana, Sigrid. 2023. Deep Learning and Scientific Computing with R Torch. New York: Chapman; Hall/CRC. https://doi.org/10.1201/9781003275923.
Ross, Noam. 2013. “FasteR! HigheR! StrongeR! - A Guide to Speeding up R Code for Busy People.” Noam Ross. http://www.noamross.net/blog/2013/4/25/faster-talk.html.